Inferring true dual-port, dual-clock RAMs in Xilinx and Altera FPGAs

Yes, it’s actually possible! – in Verilog and VHDL, even.

I’m a big fan of inference, especially as it applies to writing synthesizable Verilog code for FPGAs. Properly coded, a module that infers technology-dependent blocks (e.g. block RAMs) should: be portable between devices from a particular vendor (e.g. Spartan 3E to Virtex 6), be portable between devices from different vendors (e.g. Spartan 6 to Cyclone III), and even be portable to vendor-independent environments (e.g. simulation in Icarus Verilog).

The trick is that little “properly coded” clause. Figuring out exactly the right sort of Verilog required to get a particular tool to infer the block you want isn’t always straightforward. Figuring out exactly the right sort of Verilog to get multiple tools to infer the block you want can be even trickier. Which is, perhaps, a little bit silly – considering that the whole point behind this little exercise is to be able to write code that isn’t tied to any particular tool, device or vendor!

Using current synthesis tools from Xilinx (ISE WebPack 12.2) and Altera (Quartus II Web Edition 10.0 SP1), it’s now practical to write synthesizable, device- and vendor-independent Verilog code (or VHDL, if that’s your thing) that properly infers true dual-port (TDP), dual-clock block RAMs in each vendor’s respective FPGAs. There are, of course, a variety of limitations and caveats that come along with that statement.

Limitations

After:

  • reviewing each vendor’s pertinent documentation: the “RAMs Hardware Description Language (HDL) Coding Guidelines” from Xilinx’s XST User Guide for Virtex-6 and Spartan-6 Devices (the pre-6 XST guide doesn’t cover any of XST’s more advanced inference capabilities), and the “Inferring Memory Functions from HDL Code” section of Altera’s Recommended HDL Coding Styles;
  • reviewing each vendor’s language templates: Edit->Language Templates in ISE’s GUI, followed by drilling down to {Verilog,VHDL}->Synthesis Constructs->Coding Examples->RAM->BlockRAM; and Edit->Insert Template in Quartus’ GUI, followed by {Verilog,VHDL,SystemVerilog}->Full Designs->RAMs and ROMs;
  • and performing a few non-exhaustive synthesis experiments of my own,

..a few interesting differences can be found between Xilinx’s and Altera’s respective abilities to infer memories:

  • Both Xilinx and Altera support inferring dual-port RAMs with mixed-width ports (e.g. Port A views a memory as 512×36, while Port B views the exact same memory as 1024×18). Xilinx supports this in VHDL and Verilog. Altera supports this in VHDL and SystemVerilog (but not plain Verilog). Unfortunately, neither uses constructs that the other can understand.
  • Both Xilinx and Altera support inferring byte-enables, to a certain extent. Xilinx only supports byte-enables on single-port memories. Altera supports byte-enables on both simple and true dual-port memories. Again: Xilinx supports VHDL and Verilog, Altera supports VHDL and SystemVerilog, and both use mutually incompatible constructs.
  • Xilinx has an additional quirk relating to inferring byte-enables. XST supports 2 different coding styles: one is newer, and recommended for Virtex-6/Spartan-6 series devices (but doesn’t work on older devices); the other is older and works on all devices (but doesn’t support inferring write-first behavior).
  • Both Xilinx and Altera support specifying the initial contents of inferred RAMs and ROMs using Verilog initial blocks with $readmemh statements. Not a difference, but kinda neat (this will work in simulation, too).
  • Xilinx supports inferring clock-enables on all BRAM types. Altera only seems to support this on simpler forms of memories. Trying to place clock-enables on a true dual-port memory yields the distinctly unhelpful message “RAM logic is uninferred due to asynchronous read logic”.
  • Xilinx supports inferring all 3 read-during-write behaviors: write-first, read-first and no-change. In Altera parlance, write-first is “new data” and read-first is “old data”; Altera doesn’t explicitly call out a no-change analogue, but does support a read-enable signal that can achieve the same behavior. Altera seems to support inferring only write-first behavior on dual-clock RAMs, but does support inferring both behaviors on single-clock true dual-port RAMs.
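
As an illustration of the $readmemh point in the list above, here is a minimal sketch of a single-port ROM whose contents are loaded from a hex file. The module name rom_init, the INIT_FILE parameter and the file name mem_init.hex are all hypothetical, chosen only for this example; the sketch has not been run through either vendor's tools here.

```verilog
// A minimal sketch of an initialized, inferable ROM.
// rom_init, INIT_FILE and "mem_init.hex" are hypothetical names.
module rom_init #(
    parameter DATA      = 8,
    parameter ADDR      = 4,
    parameter INIT_FILE = "mem_init.hex"  // one hex word per line
) (
    input   wire                clk,
    input   wire    [ADDR-1:0]  addr,
    output  reg     [DATA-1:0]  dout
);

// Memory array; contents come from the file at time zero
reg [DATA-1:0] mem [(2**ADDR)-1:0];

initial begin
    $readmemh(INIT_FILE, mem);
end

// Registered (synchronous) read, as block RAMs require
always @(posedge clk) begin
    dout <= mem[addr];
end

endmodule
```

The same initial block can be dropped into a RAM description; in simulation the file is read at time zero, so the testbench sees the same starting contents as the hardware should.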

Anecdotally speaking, Xilinx’s tools seemed better able to cope with variations on coding styles than Altera’s, and had far more consistent behavior across different types of memories (e.g. clock-enables are always supported). Both Xilinx and Altera could really stand to work on arriving at inference templates that are actually interoperable.

Anyway. The end result of all that is a notion of what currently constitutes the least-common-denominator for features of an inferred true dual-port, dual clock RAM:

  • Two ports, each with:
    • clock
    • address
    • write enable
    • write data
    • read data
  • Write-first (“new data”) behavior
  • No clock enables
  • No byte enables
  • No asymmetry (no mismatched port widths)

Verilog Implementation

Writing the Verilog to implement that subset in a parameterized fashion is trivial:

// A parameterized, inferable, true dual-port, dual-clock block RAM in Verilog.

module bram_tdp #(
    parameter DATA = 72,
    parameter ADDR = 10
) (
    // Port A
    input   wire                a_clk,
    input   wire                a_wr,
    input   wire    [ADDR-1:0]  a_addr,
    input   wire    [DATA-1:0]  a_din,
    output  reg     [DATA-1:0]  a_dout,
    
    // Port B
    input   wire                b_clk,
    input   wire                b_wr,
    input   wire    [ADDR-1:0]  b_addr,
    input   wire    [DATA-1:0]  b_din,
    output  reg     [DATA-1:0]  b_dout
);

// Shared memory
reg [DATA-1:0] mem [(2**ADDR)-1:0];

// Port A (when a_wr is set, the later assignment to a_dout wins: write-first)
always @(posedge a_clk) begin
    a_dout      <= mem[a_addr];
    if(a_wr) begin
        a_dout      <= a_din;
        mem[a_addr] <= a_din;
    end
end

// Port B (when b_wr is set, the later assignment to b_dout wins: write-first)
always @(posedge b_clk) begin
    b_dout      <= mem[b_addr];
    if(b_wr) begin
        b_dout      <= b_din;
        mem[b_addr] <= b_din;
    end
end

endmodule
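
To see the write-first behavior in a vendor-independent simulator, a small self-checking testbench along the following lines may help. It is only a sketch: it assumes the bram_tdp module above is compiled alongside it, and the testbench name, the clock periods and the test values are arbitrary choices made for this example.

```verilog
`timescale 1ns/1ps

// Hypothetical testbench; assumes bram_tdp (above) is compiled with it.
module bram_tdp_tb;

localparam DATA = 8;
localparam ADDR = 4;

reg                 a_clk = 0, b_clk = 0;
reg                 a_wr  = 0, b_wr  = 0;
reg     [ADDR-1:0]  a_addr = 0, b_addr = 0;
reg     [DATA-1:0]  a_din  = 0, b_din  = 0;
wire    [DATA-1:0]  a_dout, b_dout;

bram_tdp #(.DATA(DATA), .ADDR(ADDR)) dut (
    .a_clk(a_clk), .a_wr(a_wr), .a_addr(a_addr), .a_din(a_din), .a_dout(a_dout),
    .b_clk(b_clk), .b_wr(b_wr), .b_addr(b_addr), .b_din(b_din), .b_dout(b_dout)
);

// Two unrelated clocks (periods picked arbitrarily)
always #5 a_clk = ~a_clk;
always #7 b_clk = ~b_clk;

initial begin
    // Write through port A; inputs change on the falling edge
    @(negedge a_clk);
    a_addr = 3; a_din = 8'hA5; a_wr = 1;
    @(negedge a_clk);
    a_wr = 0;
    // Write-first: port A's output already shows the new data
    if (a_dout !== 8'hA5) $display("FAIL: a_dout = %h", a_dout);

    // Read the same address through port B, after the write has completed
    @(negedge b_clk);
    b_addr = 3;
    @(negedge b_clk);
    if (b_dout !== 8'hA5) $display("FAIL: b_dout = %h", b_dout);
    else                  $display("PASS");
    $finish;
end

endmodule
```

Note that the testbench deliberately keeps the port B read well away from the port A write, so it says nothing about what happens when the two collide.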

VHDL Implementation

Likewise, in VHDL:

-- A parameterized, inferable, true dual-port, dual-clock block RAM in VHDL.

library ieee;
use ieee.std_logic_1164.all;
use ieee.std_logic_unsigned.all;

entity bram_tdp is
generic (
    DATA    : integer := 72;
    ADDR    : integer := 10
);
port (
    -- Port A
    a_clk   : in  std_logic;
    a_wr    : in  std_logic;
    a_addr  : in  std_logic_vector(ADDR-1 downto 0);
    a_din   : in  std_logic_vector(DATA-1 downto 0);
    a_dout  : out std_logic_vector(DATA-1 downto 0);
    
    -- Port B
    b_clk   : in  std_logic;
    b_wr    : in  std_logic;
    b_addr  : in  std_logic_vector(ADDR-1 downto 0);
    b_din   : in  std_logic_vector(DATA-1 downto 0);
    b_dout  : out std_logic_vector(DATA-1 downto 0)
);
end bram_tdp;

architecture rtl of bram_tdp is
    -- Shared memory
    type mem_type is array ( (2**ADDR)-1 downto 0 ) of std_logic_vector(DATA-1 downto 0);
    shared variable mem : mem_type;
begin

-- Port A (the variable is written before it is read: write-first)
process(a_clk)
begin
    if(a_clk'event and a_clk='1') then
        if(a_wr='1') then
            mem(conv_integer(a_addr)) := a_din;
        end if;
        a_dout <= mem(conv_integer(a_addr));
    end if;
end process;

-- Port B (the variable is written before it is read: write-first)
process(b_clk)
begin
    if(b_clk'event and b_clk='1') then
        if(b_wr='1') then
            mem(conv_integer(b_addr)) := b_din;
        end if;
        b_dout <= mem(conv_integer(b_addr));
    end if;
end process;

end rtl;

Synthesis

Running the Verilog code through XST yields:

Synthesizing (advanced) Unit <bram_tdp>.
INFO:Xst:3040 - The RAM <Mram_mem> will be implemented as a BLOCK RAM, absorbing the following register(s): <a_dout> <b_dout>
    -----------------------------------------------------------------------
    | ram_type           | Block                               |          |
    -----------------------------------------------------------------------
    | Port A                                                              |
    |     aspect ratio   | 1024-word x 72-bit                  |          |
    |     mode           | write-first                         |          |
    |     clkA           | connected to signal <a_clk>         | rise     |
    |     weA            | connected to signal <a_wr>          | high     |
    |     addrA          | connected to signal <a_addr>        |          |
    |     diA            | connected to signal <a_din>         |          |
    |     doA            | connected to signal <a_dout>        |          |
    -----------------------------------------------------------------------
    | optimization       | speed                               |          |
    -----------------------------------------------------------------------
    | Port B                                                              |
    |     aspect ratio   | 1024-word x 72-bit                  |          |
    |     mode           | write-first                         |          |
    |     clkB           | connected to signal <b_clk>         | rise     |
    |     weB            | connected to signal <b_wr>          | high     |
    |     addrB          | connected to signal <b_addr>        |          |
    |     diB            | connected to signal <b_din>         |          |
    |     doB            | connected to signal <b_dout>        |          |
    -----------------------------------------------------------------------
    | optimization       | speed                               |          |
    -----------------------------------------------------------------------
Unit <bram_tdp> synthesized (advanced).

And, through Quartus:

Warning: Inferred dual-clock RAM node "mem~0" from synchronous design logic.  The read-during-write behavior of a dual-clock RAM is undefined and may not match the behavior of the original design.
Info: Inferred 1 megafunctions from design logic
    Info: Inferred altsyncram megafunction from the following design logic: "mem~0" 
        Info: Parameter OPERATION_MODE set to BIDIR_DUAL_PORT
        Info: Parameter WIDTH_A set to 72
        Info: Parameter WIDTH_B set to 72
        Info: Parameter WIDTHAD_A set to 10
        Info: Parameter WIDTHAD_B set to 10
        Info: Parameter NUMWORDS_A set to 1024
        Info: Parameter NUMWORDS_B set to 1024
        Info: Parameter OUTDATA_REG_A set to UNREGISTERED
        Info: Parameter OUTDATA_REG_B set to UNREGISTERED
        Info: Parameter ADDRESS_REG_B set to CLOCK1
        Info: Parameter INDATA_REG_B set to CLOCK1
        Info: Parameter WRCONTROL_WRADDRESS_REG_B set to CLOCK1
        Info: Parameter INDATA_ACLR_A set to NONE
        Info: Parameter WRCONTROL_ACLR_A set to NONE
        Info: Parameter ADDRESS_ACLR_A set to NONE
        Info: Parameter ADDRESS_ACLR_B set to NONE
        Info: Parameter OUTDATA_ACLR_B set to NONE

Mission accomplished! – right? Not quite. Not unequivocally, anyway. There are yet more caveats..

Simulation mismatches

I hope you caught that little warning that Quartus issued. It brings up a very important caveat with respect to inferring RAMs: you must be very careful to avoid simulation/synthesis mismatches! For dual-clock memories, when one asynchronous clock domain attempts to write to the same address as the other clock domain is presently reading (or, worse: also writing), the behavior is typically unpredictable in both Xilinx and Altera FPGAs.

Have a look at this excerpt from the “Conflict Avoidance” section of Xilinx’s Spartan-6 Block RAM User Guide:

Asynchronous clocking is the more general case, where the active edges of both clocks do not occur simultaneously:

  • There are no timing restrictions when both ports perform a read operation.
  • When one port performs a write operation, the other port must not read- or write- access the same memory location. The simulation model will produce an error if this condition is violated. If this restriction is ignored, a read or write operation will produce unpredictable results. There is, however, no risk of physical damage to the device. If a read and write operation is performed, then the write will store valid data at the write location.
  • In READ_FIRST mode only, the dual-port block RAM has the additional restriction that addresses for port A and B cannot collide. This applies for both TDP and SDP modes. A read/write on one port and a write operation from the other port at the same address is not allowed. This restriction on the operation of the block RAM should not be ignored.
    • RAMB16BWER when both ports are 18 bits wide or smaller: A13–A6, including A4, cannot be the same.
    • RAMB16BWER when any one port is 36 bits wide: A13–A7, including A5, cannot be the same.
    • RAMB8BWER in all configurations: A12–A6 including A4 cannot be the same.

Yeah. Really. A simplistic, idealized model can’t even begin to describe that sort of behavior. So, if there is any risk that your design may attempt to access the same (or similar) address simultaneously from two different clock-domains (with at least one of those accesses being a write), you’ll need to take extra precautions to ensure your simulations accurately verify the design’s behavior. Two possible options would be: attempting to upgrade the inferable model to include simulation checks for this behavior, or reverting to using manually-instantiated device primitives.
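
As a rough idea of what the first option (adding simulation checks) could look like, here is a sketch of a simulation-only monitor that sits next to a bram_tdp instance and reports when the two ports touch the same address close together in time with at least one write involved. Everything about it is an assumption made for illustration: the module name, the WINDOW parameter, and the idea that "within WINDOW time units" is a good enough stand-in for a collision. It does not model the vendors' actual timing rules, nor the "similar address" restrictions quoted above.

```verilog
`timescale 1ns/1ps

// Simulation-only collision monitor (hypothetical helper, not a vendor model).
module bram_tdp_collision_monitor #(
    parameter ADDR   = 10,
    parameter WINDOW = 1    // assumed guard window, in timescale units
) (
    input   wire                a_clk,
    input   wire                a_wr,
    input   wire    [ADDR-1:0]  a_addr,
    input   wire                b_clk,
    input   wire                b_wr,
    input   wire    [ADDR-1:0]  b_addr
);

// Most recent access seen on each port
time            a_time, b_time;
reg [ADDR-1:0]  a_last, b_last;
reg             a_last_wr, b_last_wr;
reg             a_seen, b_seen;

initial begin
    a_seen = 1'b0;
    b_seen = 1'b0;
end

// Every clock edge is an access (the ports have no read enable),
// so compare against the other port's latest access.
always @(posedge a_clk) begin
    if (b_seen && (a_wr || b_last_wr) && (a_addr == b_last)
            && (($time - b_time) <= WINDOW))
        $display("%0t: possible collision (A after B) at address %h", $time, a_addr);
    a_time    = $time;
    a_last    = a_addr;
    a_last_wr = a_wr;
    a_seen    = 1'b1;
end

always @(posedge b_clk) begin
    if (a_seen && (b_wr || a_last_wr) && (b_addr == a_last)
            && (($time - a_time) <= WINDOW))
        $display("%0t: possible collision (B after A) at address %h", $time, b_addr);
    b_time    = $time;
    b_last    = b_addr;
    b_last_wr = b_wr;
    b_seen    = 1'b1;
end

endmodule
```

The monitor would be instantiated in the testbench (or guarded so that synthesis never sees it) with its inputs wired to the same signals as the RAM. A sensible value for WINDOW depends on the device's timing, which this sketch makes no attempt to capture.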

I wasn’t immediately able to find anything in Altera’s documentation that was quite as restrictive as Xilinx’s above notes about read-first address collisions, but I would not be surprised if Altera had similar restrictions. In any case, it’s still worth reading over the “Read-During-Write” section of Altera’s Internal Memory (RAM and ROM) User Guide, and the “Read-During-Write Operations” section of the handbook for a particular device family (e.g. the Memory Blocks in Cyclone III Devices document). For example, Altera specifies the undefined mixed-port read-during-write behavior thusly:

For mixed-port read-during-write operation with dual clocks, the relationship between the clocks determines the output behavior of the memory. If you use the same clock for the two clocks, the output is the old data from the address location. However, if you use different clocks, the output is unknown during the mixed-port read-during-write operation. This unknown value may be the old or new data at the address location, depending on whether the read happens before or after the write.

It’s interesting to note that Altera actually indicates that “the unknown value may be the old or new data at the address location,” rather than merely leaving it as unknown/invalid/unpredictable (as Xilinx does). Conceivably, this is behavior that might be useful in a real design (maybe not a good design, but a design nonetheless). Without more concrete assurances from Xilinx, however, you wouldn’t want to rely on this behavior in code that is meant to be portable between the two vendors.

Unsurprisingly, neither Xilinx nor Altera has defined behavior for simultaneous writes to the same address from two ports. Altera, for example, states this under “Conflict Resolution”:

When you are using M9K memory blocks in true dual-port mode, it is possible to attempt two write operations to the same memory location (address). Because there is no conflict resolution circuitry built into M9K memory blocks, this results in unknown data being written to that location. Therefore, you must implement conflict-resolution logic external to the M9K memory block.

Unexpected behavior

Mismatches aren’t limited to obscure corner-cases in simulation. Take this (seemingly) relatively-mundane simple dual-port memory, for example:

module relatively_mundane_sdp_ram #(
    parameter DATA = 72,
    parameter ADDR = 10
) (
    // Shared
    input   wire                clk,

    // Write Port
    input   wire                wr,
    input   wire    [ADDR-1:0]  wr_addr,
    input   wire    [DATA-1:0]  wr_data,

    // Read Port
    input   wire    [ADDR-1:0]  rd_addr,
    output  wire    [DATA-1:0]  rd_data
);

// Shared memory
reg [DATA-1:0] mem [(2**ADDR)-1:0];

// Write Port
always @(posedge clk)
    if(wr)
        mem[wr_addr] <= wr_data;

// Read Port
reg [ADDR-1:0] rd_addr_reg;
always @(posedge clk)
    rd_addr_reg <= rd_addr;

assign rd_data = mem[rd_addr_reg];

endmodule

The intent with that code is to create a memory in which writes will fall-through to the read port if the write and read addresses match. This would, for instance, be used in creating a low-latency synchronous FIFO with first-word-fall-through (FWFT); data pushed to an empty FIFO would be visible on the pop port on the following cycle. This exact behavior isn’t directly supported by Xilinx’s or Altera’s block RAMs.

Nonetheless, if you try to run this code through XST, it will happily infer a BRAM:

Synthesizing (advanced) Unit <relatively_mundane_sdp_ram>.
INFO:Xst:3040 - The RAM <Mram_mem> will be implemented as a BLOCK RAM, absorbing the following register(s): <rd_addr_reg>
    -----------------------------------------------------------------------
    | ram_type           | Block                               |          |
    -----------------------------------------------------------------------
    | Port A                                                              |
    |     aspect ratio   | 1024-word x 72-bit                  |          |
    |     mode           | write-first                         |          |
    |     clkA           | connected to signal <clk>           | rise     |
    |     weA            | connected to signal <wr>            | high     |
    |     addrA          | connected to signal <wr_addr>       |          |
    |     diA            | connected to signal <wr_data>       |          |
    -----------------------------------------------------------------------
    | optimization       | speed                               |          |
    -----------------------------------------------------------------------
    | Port B                                                              |
    |     aspect ratio   | 1024-word x 72-bit                  |          |
    |     mode           | write-first                         |          |
    |     clkB           | connected to signal <clk>           | rise     |
    |     addrB          | connected to signal <rd_addr>       |          |
    |     doB            | connected to signal <rd_data>       |          |
    -----------------------------------------------------------------------
    | optimization       | speed                               |          |
    -----------------------------------------------------------------------
Unit <relatively_mundane_sdp_ram> synthesized (advanced).

This won’t work! Again, an excerpt from Xilinx’s Block RAM User Guide (this time, from the “Synchronous Clocking” subsection of the “Conflict Avoidance” section):

When one port performs a write operation, the write operation succeeds; the other port can reliably read data from the same location if the write port is in READ_FIRST mode. DATA_OUT on both ports will then reflect the previously stored data.
If the write port is in either WRITE_FIRST or in NO_CHANGE mode, then the DATA_OUT on the read port would become invalid (unreliable). The mode setting of the read-port does not affect this operation.

Invalid won’t do. We want the read port’s data to reflect the newly written data from the write port!

To Altera’s credit, Quartus does correctly implement this behavior. Since it can’t be mapped directly to a RAM, Quartus adds additional bypass logic (using extra logic-elements in the process). It even tells you that it’s doing this:

Warning: Inferred RAM node "mem~0" from synchronous design logic.  Pass-through logic has been added to match the read-during-write behavior of the original design.
Info: Inferred 1 megafunctions from design logic
    Info: Inferred altsyncram megafunction from the following design logic: "mem~0" 
        Info: Parameter OPERATION_MODE set to DUAL_PORT
        Info: Parameter WIDTH_A set to 72
        Info: Parameter WIDTHAD_A set to 10
        Info: Parameter NUMWORDS_A set to 1024
        Info: Parameter WIDTH_B set to 72
        Info: Parameter WIDTHAD_B set to 10
        Info: Parameter NUMWORDS_B set to 1024
        Info: Parameter ADDRESS_ACLR_A set to NONE
        Info: Parameter OUTDATA_REG_B set to UNREGISTERED
        Info: Parameter ADDRESS_ACLR_B set to NONE
        Info: Parameter OUTDATA_ACLR_B set to NONE
        Info: Parameter ADDRESS_REG_B set to CLOCK0
        Info: Parameter INDATA_ACLR_A set to NONE
        Info: Parameter WRCONTROL_ACLR_A set to NONE
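
One way to avoid depending on what each tool decides to do is to write the pass-through logic out explicitly, so that the RAM itself only ever needs "old data" (read-first) behavior and the forwarding is ordinary logic. The following is a minimal sketch under that assumption; the module and signal names are hypothetical, and it has not been checked against either XST or Quartus here, so the inference reports should be inspected before relying on it.

```verilog
// Simple dual-port RAM with explicit write-to-read bypass.
// Hypothetical sketch; names are not from either vendor's templates.
module sdp_ram_bypass #(
    parameter DATA = 72,
    parameter ADDR = 10
) (
    input   wire                clk,

    // Write Port
    input   wire                wr,
    input   wire    [ADDR-1:0]  wr_addr,
    input   wire    [DATA-1:0]  wr_data,

    // Read Port
    input   wire    [ADDR-1:0]  rd_addr,
    output  wire    [DATA-1:0]  rd_data
);

// Shared memory
reg [DATA-1:0] mem [(2**ADDR)-1:0];

reg [DATA-1:0] mem_q;       // registered RAM output (old data on a collision)
reg [DATA-1:0] byp_data;    // copy of the data written on the last edge
reg            byp_hit;     // last edge wrote the address that was read

always @(posedge clk) begin
    if(wr)
        mem[wr_addr] <= wr_data;

    mem_q    <= mem[rd_addr];
    byp_data <= wr_data;
    byp_hit  <= wr && (wr_addr == rd_addr);
end

// Forward the freshly written data when the addresses matched
assign rd_data = byp_hit ? byp_data : mem_q;

endmodule
```

In simulation this should produce the same rd_data as the original module: when the write and read addresses match on a clock edge, the new data appears on the following cycle; otherwise the stored word does. The cost is a comparator, a set of registers and a multiplexer outside the RAM, which is presumably similar to what Quartus adds on its own.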

Conclusion

Well, now, that was easy! Certainly vastly simpler than manually instantiating device primitives – right?

So maybe inference isn’t an entirely ideal solution yet, but if you have a reasonably well-behaved design (one that doesn’t perform simultaneous reads and writes of identical memory locations) that doesn’t require anything too exotic (as if clock-enables could be considered exotic) and needs to be easily portable between vendors, then you might just give it a try. Otherwise, you may have to stick to your RAMB16s and ALTSYNCRAMs for a while longer.

A closing disclaimer: though the information presented here may appear authoritative and exhaustive (or, at least, exhausting), I must stress that these are merely my preliminary findings. It’s quite possible that I’ve overlooked or misinterpreted something (especially on the Altera side – I’m typically more of a Xilinx guy). I was focused on creating an inferable true dual-port, dual-clock memory, and did not run many tests of other forms of RAMs. When in doubt, try it out yourself!


20 Responses to Inferring true dual-port, dual-clock RAMs in Xilinx and Altera FPGAs

  1. NICE! I’m going to try and add this to my emulator right now. I want to see how big of an altera device my emulator would require so this will definitely help.

    Good work!

  2. I’ve been avoiding this sort of thing for exactly the reasons you guess I have! :)

    Nice work- thanks!

  3. tony says:

    Thanks for your useful work, Dan.

    Historically, Altera memory access was not truly dual-port. Has this changed in the latest generations?

    • Dan says:

      Altera has true dual-port memories in all of their contemporary devices (at least since the first Cyclone and Stratix parts – I didn’t check anything older than that).

      Their smallest RAMs (e.g. M512 and MLABs) don’t support this (similar to Xilinx’s distributed/LUT RAM).

      There are some width limitations when using true dual-port mode as well. For example, with an M9K in simple dual-port mode, you can have a 36-bit read port and a separate 36-bit write port. But, in true dual-port mode, you’re limited to 18-bits on each of the two read/write ports.

  4. HY says:

    Do you think it is possible to make a 3-port RAM? Our design requires 2 reads and 1 write in 1 clock cycle.

    • Dan says:

      Sure! In general, if you have a simple-dual-port RAM, you can easily create a RAM with a single write port and an arbitrary number of read ports. For N read ports, create N SDP RAMs. Tie all of their write ports together (so each RAM always has the same data), and then use the N read ports independently.

      Xilinx’s distributed RAM blocks can do this pretty efficiently (assuming you’re implementing a small register file of sorts). For example, on a Spartan-6 or Virtex-6, each slice-M has 4 LUTs. 1 LUT can be used as a read/write port, with the other 3 being read ports. This lets you create a quad-ported 64×1-bit or 32×2-bit distributed RAM in a single slice (use more slices for wider memories). Xilinx has macros for this (e.g. RAM64X1Q or RAM32X2Q), but inference should work too.

      This infers what you want in ISE (not sure about other tools/vendors):

      module regfile #(
          parameter ADDR = 6,
          parameter DATA = 32
      ) (
          input   wire    clk,
          
          // write port
          input   wire                wr_en,
          input   wire    [ADDR-1:0]  wr_addr,
          input   wire    [DATA-1:0]  wr_data,
      
          // read port 0
          input   wire    [ADDR-1:0]  rd0_addr,
          output  reg     [DATA-1:0]  rd0_data,
          
          // read port 1
          input   wire    [ADDR-1:0]  rd1_addr,
          output  reg     [DATA-1:0]  rd1_data
      );
      
      (* ram_style = "distributed" *) reg [DATA-1:0] mem[(2**ADDR)-1:0];
      
      always @(posedge clk) begin
          rd0_data    <= mem[rd0_addr];
          rd1_data    <= mem[rd1_addr];
          if(wr_en) begin
              mem[wr_addr]    <= wr_data;
          end
      end
      
      endmodule
      

      And produces this synthesis result:

      Macro Statistics
      # RAMs                                                 : 1
       64x32-bit quad-port distributed RAM                   : 1
      # Registers                                            : 64
       Flip-Flops                                            : 64
      

      And does, indeed, use 1 slice per bit:

      Slice Logic Distribution:
        Number of occupied Slices:                    32 out of   3,758    1%
      

      I’ll note that, at least in ISE 12.3, I ran into some trouble in getting the tools to infer an efficient RAM32X2Q using this code (it uses the same number of LUTs and slices for ADDR = 5 or 6). More tweaking may be required, or manual instantiation of the desired primitive.

      • Azhar says:

        I think, if one needed to use BRAMs then you could tie the writes and associated signals to 2 two port rams’ write ports. Both will have the same content, and you could read two different addresses simultaneously.

      • Azhar says:

        Sorry, just realized that that is exactly what you were alluding to in your comment too :)

  5. Balaji says:

    Hi,

    If you can tell me how to make Altera stop inferring Megafunctions it would be really helpful.
    I have written the simplest code on the planet and it’s still not working…
    I hate it when the tool does something automatically and stops you from reaching your goal.

    Please do reach me at my mail id.

    Thanks,
    Balaji

    • Dan says:

      I’m not familiar enough with Altera tools to give you a specific answer. Xilinx, at least, offers a couple of ways to control/disable inference of device primitives.

      With Xilinx’s tools, you can globally control inference for various classes of primitives (block RAMs, multipliers, shift-registers, etc.) through synthesis options. For more fine-grained control, you can embed constraints in your source code (e.g. with Verilog metacomments) that control inference at a module or signal level (I use this approach quite frequently). Constraints can also typically be specified in a UCF file. Xilinx spells all of this out in great detail in their XST User Guide and their Constraints Guide.

      A quick look at the “Megafunction Inference Options” section in Chapter 13 of Volume 1 of the Quartus II Handbook (Quartus II Integrated Synthesis) confirms that Altera does, in fact, have something analogous (both for global and fine-grained control).

  6. gopal says:

    how to program the above vhdl code in spartan 6 (sp601) with its proper ucf files , and it would be gr8 if u could throw some light on how to program it using core generator

  7. nic says:

    I’m trying to infer a synchronous dual-port byte-enabled write RAM. You mentioned this is possible, but for the life of me, I can’t get the synthesizer to do it. I can get it to work for single port RAMs but not dual port. Have you done this before?

  8. Hagen says:

    Excellent work, Dan. I was hooking up a VGA module to the JupiterACE from ZXGATE on the Papilio, got it to work but still have some sync problems — dual ported RAM will likely solve the issue. It’s nice to see that it’s possible to infer the BRAMs in a portable way, and also to read about some of the caveats.. Many thanks for the hours you put into this, and even more thanks for actually sharing it! :-)

  9. yogesh says:

    How can one use “reset” implementation using the shared variable to clear the memory contents for the dual port?

  10. Christoph says:

    Writing your code independent of tools and vendors is a very good idea. As you correctly say (and show), it is not so easy.
    It’s nice to see that you found a way so it works for this example with plain HDL code.

    For more than one reason I prefer a different approach to achieve vendor independence. I can only write VHDL, so I can speak only for that. I use wrappers around everything that is vendor-specific. In a function block using some sort of memory, for example, I would model a generic component for the memory and instantiate that in my function block. The entity of this generic component contains only the signals needed for this specific function block (makes life easier later on).

    Then I write the code for this component. This code contains some constants describing the FPGA type that is used. If you change your vendor or family you only have to change this constant!

    In VHDL I use the concurrent statement “if generate” to select the proper vendor-specific code to instantiate vendor-specific building blocks. This way I make sure that I always get the building blocks I have in mind (independent of the synthesizer), and if needed I can easily add glue logic to make the building blocks “compatible” with each other.
    For every FPGA type supported there is one “if … generate” block, so it is easy to extend it in the future without touching something that is tested and working.

    The last part missing is some automatic way to ask what kind of FPGA the synthesizer is targeting (as you normally do with a C compiler, where you can set preprocessor variables on the command line and then use #ifdef DEBUG or #ifdef LINUX in your code, for example).

  11. Nathan Clarke says:

    Dan,

    Great article. I find it useful to infer (rather than instantiate) memories not only for portability, but also for reconfigurability based on parameters/generics (with some coarse constraints on what these parameters may be, e.g. memory depths being powers of two).

    This may be a little off topic, but I am a bit confused as to why the write-first mode tends to be used more than read-first. In most cases I don’t care about the write mode, either because I’m using a single port BRAM that is only ever reading or writing at a given time, or because I’m using a simple dual-port where the write port only ever writes. In these cases I tend to arbitrarily choose read-first mode, thinking that this may produce faster reads (based on the notion that the BRAM does not have to additionally select data from the DIN input), but I found a tip in Xilinx WP231 “HDL Coding Practices to Accelerate Design Performance” that suggests the opposite is true:

    Avoid “read before write” mode to achieve maximum block RAM performance.

    I haven’t managed to find any explanation for this tip. Can you offer any thoughts or comments on this? In cases where the write mode does not matter, which mode is best to default to? Does it matter?

  12. Charan says:

    Hi Dan,

    Is it possible to generate true dual port ram with byte-write enable ?

  13. Syeda Anisa Gohar says:

    Wow, this is what I was looking for. It infers the BRAM exactly as you specified, but when I write to the memory from one module and read from the same address in another module (in a small program just to test it, because I need to do the same in a big project), it’s not giving me the same result. Why? Can you please tell me the solution?

  14. Carl W says:

    Nice and complete post! nic, are you describing your two ports in separate processes? This is necessary for XST to infer it as dual-port. (I’ve got a post on inferring dual-port RAM with XST as well, welcome to check it out.)

